{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Install LlamaIndex and packages"
      ],
      "metadata": {
        "id": "fbpAVlRKSdrY"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "vjL1s6JomjXf"
      },
      "outputs": [],
      "source": [
        "!pip install llama-index llama-index-tools-tavily-research -q"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!llamaindex-cli download-llamapack CorrectiveRAGPack --download-dir ./corrective_rag_pack -q"
      ],
      "metadata": {
        "collapsed": true,
        "id": "eHJQMRw1mv1U"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "#or\n",
        "# from llama_index.core.llama_pack import download_llama_pack\n",
        "\n",
        "# # download and install dependencies\n",
        "# CorrectiveRAGPack = download_llama_pack(\n",
        "#     \"CorrectiveRAGPack\", \"./corrective_rag_pack\"\n",
        "# )"
      ],
      "metadata": {
        "id": "8NcnJ4HDnEJu"
      },
      "execution_count": 29,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Get API keys"
      ],
      "metadata": {
        "id": "Yki3t5NUSgnP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import userdata\n",
        "import os\n",
        "tavily_ai_api_key = userdata.get('TAVILY_API_KEY')\n",
        "LLAMAPARSE_API_KEY = userdata.get('LLAMAPARSE_API_KEY')\n",
        "os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')"
      ],
      "metadata": {
        "id": "OT3TrXILnWVC"
      },
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# LlamaParse"
      ],
      "metadata": {
        "id": "S5MtpUobSjo_"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "without gpt-4o : gpt4o_mode=False"
      ],
      "metadata": {
        "id": "mXax-w-yZM6s"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# parser = LlamaParse(api_key=LLAMAPARSE_API_KEY, gpt4o_mode=True,\n",
        "#     result_type=\"markdown\"  # \"markdown\" and \"text\" are available\n",
        "# )"
      ],
      "metadata": {
        "id": "b6A7QH6tZSM6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n",
        "\n",
        "path='/content/drive/MyDrive/000_online_course/RAG/data/'"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jQpsrhLXoWoM",
        "outputId": "56ccde0d-adff-451e-e54f-4a8f456de7a6"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "path_file=path+'jpmorgan_annualreport-2022-splitted.pdf' #local path\n",
        "\n",
        "from llama_parse import LlamaParse\n",
        "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
        "import nest_asyncio;\n",
        "nest_asyncio.apply()\n",
        "\n",
        "# set up parser\n",
        "parser = LlamaParse(api_key=LLAMAPARSE_API_KEY,\n",
        "    result_type=\"markdown\"  # \"markdown\" and \"text\" are available\n",
        ")\n",
        "\n",
        "# use SimpleDirectoryReader to parse our file\n",
        "file_extractor = {\".pdf\": parser}\n",
        "documents = SimpleDirectoryReader(input_files=[path_file], file_extractor=file_extractor).load_data()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6LYKiNV9on38",
        "outputId": "1d3bbffe-f7ca-4933-ed23-0c98a4c98d50"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Started parsing the file under job_id 1d6ed45d-26ea-4e9b-8901-ee4f3f372a03\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Some Thoughts"
      ],
      "metadata": {
        "id": "39Vg3UDAbynL"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "*   As you can see the chart part is not well parsed, because I didn't use the GPT-4o in LlamaParse.\n",
        "*   I dind't in purpose to see if the CRAG can collect the adequate info from the web search."
      ],
      "metadata": {
        "id": "tgICHp3cbsjz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(documents[0].text)\n",
        "#As you can see the chart part is not well parsed, because I didn't use the GPT-4o in LlamaParse.\n",
        "#I dind't in purpose to see if the CRAG can collect the adequate info from the web search."
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "dkprxrR3XI3j",
        "outputId": "142ff516-3bee-4a9c-8505-bf86f30dc20c"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# Earnings, Diluted Earnings per Share and Return on Tangible Common Equity\n",
            "\n",
            "$48.3 2004–2022 ($ in billions, except per share and ratio data)\n",
            "\n",
            "| |Reported|Reported|Reported|Excluding reserve release/build|Excluding reserve release/build|Excluding reserve release/build|\n",
            "|---|---|---|\n",
            "| |2020|2021|2022|2020|2021|2022|\n",
            "|Net income ($B)|$29.1|$48.3|$37.7|$38.4|$39.1|$40.4|\n",
            "|Diluted EPS ($)|$8.88|$15.36|$12.09|$11.87|$12.35|$12.99|\n",
            "|ROTCE|14.4%|23.0%|17.7%|19.3%|18.5%|19.1%|\n",
            "\n",
            "Adjusted net income2 $32.5 $15.36 $10.72 $12.09 $26.9 $29.1 $9.00 $24.4 $24.7 $24.4 $21.7 $8.88 23% $21.3 18% $19.0 $17.4 $17.9 $6.00 $6.19 19% $6.31 $15.4 11% 17% $14.4 15% $14.4 15% 15% 13% $5.19 $5.29 12% $11.7 10% $4.48 $4.34 $8.5 $4.00 $4.33 6% $3.96 $2.35 $5.6 $2.26 $4.5 $1.35 $1.52\n",
            "\n",
            "2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022\n",
            "\n",
            "Net income Diluted earnings per share (EPS) Return on tangible common equity (ROTCE) Adjusted ROTCE2 ROTCE excluding reserve release/build\n",
            "\n",
            "Firmwide results excluding reserve release/build are non-GAAP financial measures. Adjusted net income excludes $2.4 billion from net income in 2017 as a result of the enactment of the Tax Cuts and Jobs Act. ROTCE was 13.6% for 2017 and 18.5% for 2021\n",
            "\n",
            "GAAP = Generally accepted accounting principles ROTCE = Return on tangible common equity\n",
            "\n",
            "An important note to describe why we are showing the table above: The loan loss reserve accounting rules — which are life-of-loan estimated losses based upon probability-based economic scenarios — generate huge swings in earnings that can be unrelated to actual credit performance. This was particularly true for the COVID-19 years when, during the first six months of the pandemic, we built approximately $16 billion in reserves. Then in the next six quarters, we released essentially the equivalent number. We did so only because the scenarios used to estimate future credit losses changed dramatically.\n",
            "\n",
            "The table above shows reported net income, with and without loan loss reserve changes. Throughout this period, the credit portfolio was healthy, and charge-offs remained below pre-pandemic levels. Either way, the company had strong absolute and relative performance.\n",
            "---\n",
            "# Tangible Book Value1 and Average Stock Price per Share\n",
            "\n",
            "| |2004|2005|2006|2007|2008|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|2019|2020|2021|2022|\n",
            "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
            "|Tangible book value|$38.70|$36.07|$39.83|$40.36|$39.36|$39.22|$48.13|$51.44| | | | | | | | | | | |\n",
            "|Average stock price|$15.35|$16.45| | | | | | | | | | | | | | | | | |\n",
            "\n",
            "1 9% compound annual growth rate since 2004.\n",
            "\n",
            "# Stock total return analysis\n",
            "\n",
            "| |Bank One|S&P 500 Index|S&P Financials Index|\n",
            "|---|---|---|---|\n",
            "|Performance since becoming CEO of Bank One (3/27/2000—12/31/2022)1| | | |\n",
            "|Compounded annual gain|11.3%|6.1%|4.6%|\n",
            "|Overall gain|1,047.8%|287.7%|176.1%|\n",
            "| |JPMorgan Chase & Co.|S&P 500 Index|S&P Financials Index|\n",
            "|Performance since the Bank One and JPMorgan Chase & Co. merger (7/1/2004—12/31/2022)| | | |\n",
            "|Compounded annual gain|9.9%|8.9%|4.4%|\n",
            "|Overall gain|471.6%|386.8%|120.0%|\n",
            "|Performance for the period ended December 31, 2022| | | |\n",
            "|Compounded annual gain/(loss)|One year|(12.6)%|(18.1)%|(10.5)%|\n",
            "| |Five years|7.7%|9.4%|6.4%|\n",
            "| |Ten years|14.9%|12.6%|12.1%|\n",
            "\n",
            "This chart shows actual returns of the stock, with dividends reinvested, for heritage shareholders of Bank One and JPMorgan Chase & Co. vs. the Standard & Poor’s 500 Index (S&P 500 Index) and the Standard & Poor’s Financials Index (S&P Financials Index).\n",
            "\n",
            "1 On March 27, 2000, Jamie Dimon was hired as CEO of Bank One.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Modifying the base CorrectiveRAGPack"
      ],
      "metadata": {
        "id": "Tr2Huh47TPFB"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "I made small modification to see some intermediate results,  like:\n",
        "\n",
        "*   relevancy_score (the score the LLM gives to each retrieved document). This score is designed by this template prompt: DEFAULT_RELEVANCY_PROMPT_TEMPLATE\n",
        "\n",
        "  - Assign a binary score to indicate the document's relevance.\n",
        "  - Use 'yes' if the document is relevant to the question, or 'no' if it is not.\n",
        "\n",
        "*   relevant_text : Only text with \"yes\" score\n",
        "\n",
        "*   search_text ( coming from tavily)\n",
        "*   query transformed by the LLM to enhance its search performance\n",
        "*   I modified to use GPT-3.5-Turbo instead of GPT-4 specified by default."
      ],
      "metadata": {
        "id": "R2Yb6RNCZ6Rq"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Loading the modified CorrectiveRAGPack"
      ],
      "metadata": {
        "id": "-Gg-j-yhcs_7"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from corrective_rag_pack.llama_index.packs.corrective_rag.base import CorrectiveRAGPack\n",
        "corrective_rag = CorrectiveRAGPack(documents, tavily_ai_api_key)"
      ],
      "metadata": {
        "id": "-DgtfGD7wGCy"
      },
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Question 1"
      ],
      "metadata": {
        "id": "2Vg8PesjftfL"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Both values are in the document:"
      ],
      "metadata": {
        "id": "-v0vfq7Rgf5D"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Thoughts: **\n",
        "Values well extracted and the calculation is correct"
      ],
      "metadata": {
        "id": "5ZX78XFQkWO8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "prompt = \"What is the change in net income from 2021 to 2022?\"\n",
        "response = corrective_rag.run(prompt, similarity_top_k=2)\n",
        "\n",
        "# ***************relevancy_results***************\n",
        "# ['yes', 'no']\n",
        "\n",
        "\n",
        "\n",
        "# ***************relevant_text***************\n",
        "# # Earnings, Diluted Earnings per Share and Return on Tangible Common Equity\n",
        "\n",
        "# $48.3 2004–2022 ($ in billions, except per share and ratio data)\n",
        "\n",
        "# *********transformed_query_str for search_result********\n",
        "# Net income difference between 2021 and 2022\n",
        "\n",
        "\n",
        "\n",
        "# ***************search_text (from Tavily): first [:500]***************\n",
        "# The Internal Revenue Service has released 2022 inflation adjustments for federal income tax brackets"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4231Du24abi7",
        "outputId": "1fc20c5c-3e8f-4c2f-ea99-6d526e23ada1"
      },
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "***************relevancy_results***************\n",
            "['yes', 'no']\n",
            "\n",
            "\n",
            "\n",
            "***************relevant_text***************\n",
            "# Earnings, Diluted Earnings per Share and Return on Tangible Common Equity\n",
            "\n",
            "$48.3 2004–2022 ($ in billions, except per share and ratio data)\n",
            "\n",
            "| |Reported|Reported|Reported|Excluding reserve release/build|Excluding reserve release/build|Excluding reserve release/build|\n",
            "|---|---|---|\n",
            "| |2020|2021|2022|2020|2021|2022|\n",
            "|Net income ($B)|$29.1|$48.3|$37.7|$38.4|$39.1|$40.4|\n",
            "|Diluted EPS ($)|$8.88|$15.36|$12.09|$11.87|$12.35|$12.99|\n",
            "|ROTCE|14.4%|23.0%|17.7%|19.3%|18.5%|19.1%|\n",
            "\n",
            "Adjusted net income2 $32.5 $15.36 $10.72 $12.09 $26.9 $29.1 $9.00 $24.4 $24.7 $24.4 $21.7 $8.88 23% $21.3 18% $19.0 $17.4 $17.9 $6.00 $6.19 19% $6.31 $15.4 11% 17% $14.4 15% $14.4 15% 15% 13% $5.19 $5.29 12% $11.7 10% $4.48 $4.34 $8.5 $4.00 $4.33 6% $3.96 $2.35 $5.6 $2.26 $4.5 $1.35 $1.52\n",
            "\n",
            "2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022\n",
            "\n",
            "Net income Diluted earnings per share (EPS) Return on tangible common equity (ROTCE) Adjusted ROTCE2 ROTCE excluding reserve release/build\n",
            "\n",
            "Firmwide results excluding reserve release/build are non-GAAP financial measures. Adjusted net income excludes $2.4 billion from net income in 2017 as a result of the enactment of the Tax Cuts and Jobs Act. ROTCE was 13.6% for 2017 and 18.5% for 2021\n",
            "\n",
            "GAAP = Generally accepted accounting principles ROTCE = Return on tangible common equity\n",
            "\n",
            "An important note to describe why we are showing the table above: The loan loss reserve accounting rules — which are life-of-loan estimated losses based upon probability-based economic scenarios — generate huge swings in earnings that can be unrelated to actual credit performance. This was particularly true for the COVID-19 years when, during the first six months of the pandemic, we built approximately $16 billion in reserves. Then in the next six quarters, we released essentially the equivalent number. We did so only because the scenarios used to estimate future credit losses changed dramatically.\n",
            "\n",
            "The table above shows reported net income, with and without loan loss reserve changes. Throughout this period, the credit portfolio was healthy, and charge-offs remained below pre-pandemic levels. Either way, the company had strong absolute and relative performance.\n",
            "---\n",
            "# Tangible Book Value1 and Average Stock Price per Share\n",
            "\n",
            "| |2004|2005|2006|2007|2008|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|2019|2020|2021|2022|\n",
            "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
            "|Tangible book value|$38.70|$36.07|$39.83|$40.36|$39.36|$39.22|$48.13|$51.44| | | | | | | | | | | |\n",
            "|Average stock price|$15.35|$16.45| | | | | | | | | | | | | | | | | |\n",
            "\n",
            "1 9% compound annual growth rate since 2004.\n",
            "\n",
            "\n",
            "\n",
            "*********transformed_query_str for search_result********\n",
            "Net income difference between 2021 and 2022\n",
            "\n",
            "\n",
            "\n",
            "***************search_text (from Tavily): first [:500]***************\n",
            "The Internal Revenue Service has released 2022 inflation adjustments for federal income tax brackets, the standard deduction, and other parts of the tax code. See below for how these 2022 brackets compare to 2021 brackets. Importantly, the 2021 brackets are for income earned in 2021, which most people will file taxes on before April 15, 2022. The 2022 brackets are for income earned in 2022 ...\n",
            "Previous Versions\n",
            "2024 Tax Brackets\n",
            "2023 Tax Brackets\n",
            "2021 Tax Brackets\n",
            "2020 Tax Brackets\n",
            "2019 Tax Brac\n",
            "Using search_text : True\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(response.response)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "P_mvFajhfvah",
        "outputId": "99c97a86-e2ce-4c73-d1e1-22622cdd2a3e"
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The change in net income from 2021 to 2022 is a decrease of $10.6 billion.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Question 2"
      ],
      "metadata": {
        "id": "m6ZNwx25fytb"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "The same here:\n",
        "\n",
        "**Thoughts:**\n",
        "Values well extracted and the calculation is correct\n",
        "\n"
      ],
      "metadata": {
        "id": "VZGnghYFglRq"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "prompt = \"Answer step-by-step to these questions: 1- What is the net income in 2021? 2-What is the net income in 2022? -What is the change in the net income from 2021 to 2022?\"\n",
        "response = corrective_rag.run(prompt, similarity_top_k=2)\n",
        "\n",
        "# ***************relevancy_results***************\n",
        "# ['yes', 'no']\n",
        "\n",
        "\n",
        "\n",
        "# ***************relevant_text***************\n",
        "\n",
        "# *********transformed_query_str for search_result********\n",
        "# Net income 2021 and 2022 comparison."
      ],
      "metadata": {
        "id": "H3jL1R-7aeWy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(response.response)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "67mFetHMf3w7",
        "outputId": "ac8f4f62-169a-4f4a-b8a2-6034d7e413f9"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "1. The net income in 2021 was $48.3 billion.\n",
            "2. The net income in 2022 was $37.7 billion.\n",
            "3. The change in net income from 2021 to 2022 was a decrease of $10.6 billion.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Question 3"
      ],
      "metadata": {
        "id": "Qn7cw2PVf9Hj"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "This information is not the report. So we expect the CRAG uses tavily search ."
      ],
      "metadata": {
        "id": "5z7q-1d3gnRy"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "The transformed query by the CRAG: ==> *\"Describe JPMorgan's activities.\"*"
      ],
      "metadata": {
        "id": "jloffB8Ng2Bz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "prompt = \"Give a description about JPMorgan activities?\"\n",
        "response = corrective_rag.run(prompt, similarity_top_k=2)\n",
        "response"
      ],
      "metadata": {
        "id": "7PG7Esupahei"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Thoughts**: I'm expecting more valuable information, howver I get a quick description:"
      ],
      "metadata": {
        "id": "0BohkU0nkGOT"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "#The answer not satisfasante:\n",
        "print(response.response)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "jWR4Zm84gQ6g",
        "outputId": "50f6581c-5bae-4775-adf4-c876b642f7fa"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "JPMorgan Chase & Co. is a leading global financial services firm with assets of $2.6 trillion and operations worldwide. The company serves millions of customers in the United States and many of the world's most prominent corporate, institutional, and government clients under its J.P. Morgan and Chase brands.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Question 4"
      ],
      "metadata": {
        "id": "GqnUQbd6g8qr"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "I asked for Net Income in 2019. This value is not accuratly parsed. So we expect the CRAG to have a look at it on the internet:"
      ],
      "metadata": {
        "id": "Jm9qWyNQg-VU"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Thoughts**\n",
        "Important to know: the quality of your question here is crucial. As the CRAG will transform your question to a better prompt to enhance its search perfomance, you need to give it all the important keywords:\n",
        "\n",
        "For example:\n",
        "*   You are attempted to ask this following question because you are chating with your llm:\n",
        "**\"What was the net income in 2019?\"**\n",
        "\n",
        "So the transformed query by the CRAG will be:\n",
        "**\"2019 net income\"**\n",
        "\n",
        "==> This when sent to tavily search, it will results in a poor quality answers!!\n",
        "\n",
        "*   However if you ask:\n",
        "\n",
        "**\"What was the net income in 2019 of JPMorgan Chase bank?\"**\n",
        "The transformer question will be: **\"JPMorgan Chase bank 2019 net income\"**\n",
        "=> This will result in a good answer."
      ],
      "metadata": {
        "id": "sBXzrjFthM47"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "prompt = \"What was the net income in 2019 of JPMorgan Chase bank?\"\n",
        "response = corrective_rag.run(prompt, similarity_top_k=3)\n",
        "response"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "G0CU3GQjv_Hj",
        "outputId": "59273b50-45fe-4c88-a40e-1a8dece457c1"
      },
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "***************relevancy_results***************\n",
            "['no', 'no']\n",
            "\n",
            "\n",
            "\n",
            "***************relevant_text***************\n",
            "\n",
            "\n",
            "\n",
            "\n",
            "*********transformed_query_str for search_result********\n",
            "JPMorgan Chase bank 2019 net income\n",
            "\n",
            "\n",
            "\n",
            "***************search_text (from Tavily): first [:500]***************\n",
            "Fiscal year is January-December. All values USD Millions. 2023 2022 2021 2020 2019 5-year trend; Interest Income: 170,588: 92,807: 57,864: 64,523: 84,040\n",
            "Get the detailed quarterly/annual income statement for JPMorgan Chase & Co. (JPM). Find out the revenue, expenses and profit or loss over the last fiscal year.\n",
            "JPMorgan Chase annual net income for 2023 was $47.76B, a 33.07% increase from 2022. JPMorgan Chase annual net income for 2022 was $35.892B, a 22.82% decline from 2021. JPMorgan Chase ann\n",
            "Using search_text : True\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Response(response='The net income in 2019 for JPMorgan Chase bank was $36.4 billion.', source_nodes=[NodeWithScore(node=TextNode(id_='afdf0ed4-fa1b-411e-b6c4-abe7ddc0c7d6', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ab63a31f-8ea7-459b-b504-311d4eae075d', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='e6ef3b5739a5163dd55bdf0c8a3c63be9aa763d8bdc15752c08295864008902f')}, text='Fiscal year is January-December. All values USD Millions. 2023 2022 2021 2020 2019 5-year trend; Interest Income: 170,588: 92,807: 57,864: 64,523: 84,040\\nGet the detailed quarterly/annual income statement for JPMorgan Chase & Co. (JPM). Find out the revenue, expenses and profit or loss over the last fiscal year.\\nJPMorgan Chase annual net income for 2023 was $47.76B, a 33.07% increase from 2022. JPMorgan Chase annual net income for 2022 was $35.892B, a 22.82% decline from 2021. JPMorgan Chase annual net income for 2021 was $46.503B, a 69.66% increase from 2020. JPMorgan Chase & Co. is one of the largest financial service firms in the world.\\n(in millions, except per share, ratio data and headcount) 2021 2020 2019 Selected income statement data Total net revenue(a) $ 121,649 $ 119,951 $ 115,720 Total noninterest expense 71,343 66,656 65,269 Pre-provision profit(b) 50,306 ... Whether looking back 10 years or since the JPMorgan Chase/Bank One merger in 2004, these investments have ...\\nfor JPMorgan Chase, with the firm generating record revenue and net income, as well as setting numerous other records across our lines of business. We earned $36.4 billion in net income on revenue1 of $118.7 billion, reflecting strong underlying performance across our businesses. We now have delivered', start_char_idx=1, end_char_idx=1298, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=None)], metadata={'afdf0ed4-fa1b-411e-b6c4-abe7ddc0c7d6': {}})"
            ]
          },
          "metadata": {},
          "execution_count": 19
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "response.response"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "id": "_T1_iHDIyWUU",
        "outputId": "1c0db6cf-c5b9-43fa-cba5-55e03de267dd"
      },
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'The net income in 2019 for JPMorgan Chase bank was $36.4 billion.'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Question 5"
      ],
      "metadata": {
        "id": "CP443oRai5uq"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "I asked 2 questions (the name of the firm and the same question than before):"
      ],
      "metadata": {
        "id": "20oXJCYajDNc"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Thoughts**:\n",
        "This kind multiple queries in the same results in the LLM hallucinating ==> Decompose the questions on simple ones.\n",
        "\n",
        "2019 Net Income in this answer is 48.3 instead of $36.4"
      ],
      "metadata": {
        "id": "c0peJ5EYjKwT"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "prompt = \"To which firm corrsponds the financianl report document? and What was the net income in 2019?\"\n",
        "response = corrective_rag.run(prompt, similarity_top_k=2)\n",
        "response"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "HXLRDNZev-9z",
        "outputId": "6ecf14c2-61e7-4c41-cea8-059c055a22a4"
      },
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "***************relevancy_results***************\n",
            "['no', 'yes']\n",
            "\n",
            "\n",
            "\n",
            "***************relevant_text***************\n",
            "# Earnings, Diluted Earnings per Share and Return on Tangible Common Equity\n",
            "\n",
            "$48.3 2004–2022 ($ in billions, except per share and ratio data)\n",
            "\n",
            "| |Reported|Reported|Reported|Excluding reserve release/build|Excluding reserve release/build|Excluding reserve release/build|\n",
            "|---|---|---|\n",
            "| |2020|2021|2022|2020|2021|2022|\n",
            "|Net income ($B)|$29.1|$48.3|$37.7|$38.4|$39.1|$40.4|\n",
            "|Diluted EPS ($)|$8.88|$15.36|$12.09|$11.87|$12.35|$12.99|\n",
            "|ROTCE|14.4%|23.0%|17.7%|19.3%|18.5%|19.1%|\n",
            "\n",
            "Adjusted net income2 $32.5 $15.36 $10.72 $12.09 $26.9 $29.1 $9.00 $24.4 $24.7 $24.4 $21.7 $8.88 23% $21.3 18% $19.0 $17.4 $17.9 $6.00 $6.19 19% $6.31 $15.4 11% 17% $14.4 15% $14.4 15% 15% 13% $5.19 $5.29 12% $11.7 10% $4.48 $4.34 $8.5 $4.00 $4.33 6% $3.96 $2.35 $5.6 $2.26 $4.5 $1.35 $1.52\n",
            "\n",
            "2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022\n",
            "\n",
            "Net income Diluted earnings per share (EPS) Return on tangible common equity (ROTCE) Adjusted ROTCE2 ROTCE excluding reserve release/build\n",
            "\n",
            "Firmwide results excluding reserve release/build are non-GAAP financial measures. Adjusted net income excludes $2.4 billion from net income in 2017 as a result of the enactment of the Tax Cuts and Jobs Act. ROTCE was 13.6% for 2017 and 18.5% for 2021\n",
            "\n",
            "GAAP = Generally accepted accounting principles ROTCE = Return on tangible common equity\n",
            "\n",
            "An important note to describe why we are showing the table above: The loan loss reserve accounting rules — which are life-of-loan estimated losses based upon probability-based economic scenarios — generate huge swings in earnings that can be unrelated to actual credit performance. This was particularly true for the COVID-19 years when, during the first six months of the pandemic, we built approximately $16 billion in reserves. Then in the next six quarters, we released essentially the equivalent number. We did so only because the scenarios used to estimate future credit losses changed dramatically.\n",
            "\n",
            "The table above shows reported net income, with and without loan loss reserve changes. Throughout this period, the credit portfolio was healthy, and charge-offs remained below pre-pandemic levels. Either way, the company had strong absolute and relative performance.\n",
            "---\n",
            "# Tangible Book Value1 and Average Stock Price per Share\n",
            "\n",
            "| |2004|2005|2006|2007|2008|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|2019|2020|2021|2022|\n",
            "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
            "|Tangible book value|$38.70|$36.07|$39.83|$40.36|$39.36|$39.22|$48.13|$51.44| | | | | | | | | | | |\n",
            "|Average stock price|$15.35|$16.45| | | | | | | | | | | | | | | | | |\n",
            "\n",
            "1 9% compound annual growth rate since 2004.\n",
            "\n",
            "\n",
            "\n",
            "*********transformed_query_str for search_result********\n",
            "\"Which firm issued the financial report document and what was their net income in 2019?\"\n",
            "\n",
            "\n",
            "\n",
            "***************search_text (from Tavily): first [:500]***************\n",
            "\n",
            "Using search_text : False\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Response(response='The financial report document corresponds to a banking or financial services firm. The net income in 2019 was $48.3 billion.', source_nodes=[NodeWithScore(node=TextNode(id_='2f5be844-c8e5-4efd-be85-1c6d78804263', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='46744819-9a99-43d4-9c30-2f54aebd7497', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='cab101ff79734acc999d802222ca9dcc863b2351bd6f0effd9a1b7e71d29b497')}, text='# Earnings, Diluted Earnings per Share and Return on Tangible Common Equity\\n\\n$48.3 2004–2022 ($ in billions, except per share and ratio data)\\n\\n| |Reported|Reported|Reported|Excluding reserve release/build|Excluding reserve release/build|Excluding reserve release/build|\\n|---|---|---|\\n| |2020|2021|2022|2020|2021|2022|\\n|Net income ($B)|$29.1|$48.3|$37.7|$38.4|$39.1|$40.4|\\n|Diluted EPS ($)|$8.88|$15.36|$12.09|$11.87|$12.35|$12.99|\\n|ROTCE|14.4%|23.0%|17.7%|19.3%|18.5%|19.1%|\\n\\nAdjusted net income2 $32.5 $15.36 $10.72 $12.09 $26.9 $29.1 $9.00 $24.4 $24.7 $24.4 $21.7 $8.88 23% $21.3 18% $19.0 $17.4 $17.9 $6.00 $6.19 19% $6.31 $15.4 11% 17% $14.4 15% $14.4 15% 15% 13% $5.19 $5.29 12% $11.7 10% $4.48 $4.34 $8.5 $4.00 $4.33 6% $3.96 $2.35 $5.6 $2.26 $4.5 $1.35 $1.52\\n\\n2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022\\n\\nNet income Diluted earnings per share (EPS) Return on tangible common equity (ROTCE) Adjusted ROTCE2 ROTCE excluding reserve release/build\\n\\nFirmwide results excluding reserve release/build are non-GAAP financial measures. Adjusted net income excludes $2.4 billion from net income in 2017 as a result of the enactment of the Tax Cuts and Jobs Act. ROTCE was 13.6% for 2017 and 18.5% for 2021\\n\\nGAAP = Generally accepted accounting principles ROTCE = Return on tangible common equity\\n\\nAn important note to describe why we are showing the table above: The loan loss reserve accounting rules — which are life-of-loan estimated losses based upon probability-based economic scenarios — generate huge swings in earnings that can be unrelated to actual credit performance. This was particularly true for the COVID-19 years when, during the first six months of the pandemic, we built approximately $16 billion in reserves. Then in the next six quarters, we released essentially the equivalent number. We did so only because the scenarios used to estimate future credit losses changed dramatically.\\n\\nThe table above shows reported net income, with and without loan loss reserve changes. Throughout this period, the credit portfolio was healthy, and charge-offs remained below pre-pandemic levels. Either way, the company had strong absolute and relative performance.\\n---\\n# Tangible Book Value1 and Average Stock Price per Share\\n\\n| |2004|2005|2006|2007|2008|2009|2010|2011|2012|2013|2014|2015|2016|2017|2018|2019|2020|2021|2022|\\n|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\\n|Tangible book value|$38.70|$36.07|$39.83|$40.36|$39.36|$39.22|$48.13|$51.44| | | | | | | | | | | |\\n|Average stock price|$15.35|$16.45| | | | | | | | | | | | | | | | | |\\n\\n1 9% compound annual growth rate since 2004.', start_char_idx=0, end_char_idx=2677, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=None)], metadata={'2f5be844-c8e5-4efd-be85-1c6d78804263': {}})"
            ]
          },
          "metadata": {},
          "execution_count": 21
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "response.response"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "id": "VBi--C4nBin4",
        "outputId": "403aa1e7-9ee8-4ae3-de8d-9cc111fd110c"
      },
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'The financial report document corresponds to a banking or financial services firm. The net income in 2019 was $48.3 billion.'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 22
        }
      ]
    }
  ]
}